Smart Paper Notes

Recent research shows that synthetic data, used as a source of supervision, helps pretrained language models (PLMs) transfer to new target tasks and domains. However, this idea is less explored for spatial language. We provide two new data resources for multiple spatial language processing tasks. The first dataset is synthesized for transfer learning on spatial question answering (SQA) and spatial role labeling (SpRL). Compared to previous SQA datasets, it includes a larger variety of spatial relation types and spatial expressions. Our data generation process is easily extendable with new spatial expression lexicons. The second is a real-world SQA dataset with human-generated questions, built on an existing corpus with SpRL annotations. This dataset can be used to evaluate spatial language processing models in realistic situations. We show that pretraining with automatically generated data significantly improves the state-of-the-art results on several SQA and SpRL benchmarks, particularly when the training data in the target domain is small.

Translated by Google Translate

ArmanEmo: A Persian Dataset for Text-based Emotion Detection

Hossein Mirzaee, Javad Peymanfard, Hamid Habibzadeh Moshtaghin, Hossein Zeinali

Categories: Natural Language Processing | Artificial Intelligence

2022-07-24

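With the recent proliferation of open textual data on social media platforms, emotion detection (ED) from text has received more attention over the past few years. It has many applications, especially for businesses and online service providers, where emotion detection techniques can help them make informed commercial decisions by analyzing how customers and users feel about their products and services. In this study, we introduce ArmanEmo, a human-labeled emotion dataset of more than 7,000 Persian sentences labeled for seven categories. The dataset was collected from different resources, including Twitter, Instagram, and Digikala (an Iranian e-commerce company) comments. Labels are based on Ekman's six basic emotions (anger, fear, happiness, hatred, sadness, wonder) plus one additional category (other) to cover any emotion not included in Ekman's model. Along with the dataset, we provide several baseline models for emotion classification, focusing on state-of-the-art transformer-based language models. Our best model achieves a macro-averaged F1 score of 75.39% on our test dataset. We also conduct transfer learning experiments to compare the generalization of our proposed dataset against other Persian emotion datasets. The results of these experiments suggest that our dataset generalizes better than the existing Persian emotion datasets. ArmanEmo is publicly available at https://github.com/arman-rayan-sharif/arman-text-emotion.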
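As an illustration of the metric reported for ArmanEmo, the following is a minimal sketch of a macro-averaged F1 computation over the seven categories, assuming the reported macro-averaged score is an F1 score. The function name `macro_f1`, the lowercase label strings, and the toy predictions are assumptions made for this example; they are not taken from the ArmanEmo repository or its evaluation code.

```python
from collections import Counter

# The seven categories named in the abstract (label strings are illustrative).
LABELS = ["anger", "fear", "happiness", "hatred", "sadness", "wonder", "other"]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumed convention: a class with no gold and no predicted
        # examples contributes an F1 of 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data, invented for illustration only.
y_true = ["anger", "fear", "other", "anger"]
y_pred = ["anger", "other", "other", "fear"]
print(f"macro-F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because every class carries equal weight in the average, rare categories influence the score as much as frequent ones, which is why macro averaging is a common choice for imbalanced emotion datasets.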


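Improving software performance is an important yet challenging part of the software development cycle. Today, the majority of performance inefficiencies are identified and patched by performance experts. Recent advances in deep learning approaches and the widespread availability of open-source data create a great opportunity to automate the identification and patching of performance problems. In this paper, we present DeepPERF, a transformer-based approach to suggest performance improvements for C# applications. We pretrain DeepPERF on English and source code corpora and then finetune it for the task of generating performance improvement patches for C# applications. Our evaluation shows that our model can generate the same performance improvement suggestion as the developer fix in about 53% of cases, matching about 34% of them verbatim, on our expert-verified dataset of performance changes made by C# developers. Additionally, we evaluate DeepPERF on 50 open-source C# repositories on GitHub using both benchmarks and unit tests, and find that our model is able to suggest valid performance improvements that reduce both CPU usage and memory allocations. So far, we have submitted 19 pull requests with 28 different performance optimizations, and 11 of these PRs have been approved by the project owners.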
